S. M. Ross, University of California, Berkeley
0.  Introduction 

This paper reconsiders the classical model for selling an
asset in which offers come in daily and a decision must then be
made as to whether or not to sell. For each day the item remains
unsold a continuation (or maintenance) cost c is incurred. The
successive offers are assumed to be independent and identically
distributed random variables having an unknown distribution F.
The model is considered both in the case where once an offer is
rejected it may not be recalled at a later time, and in the case
where such recall of previous offers is allowed.

In Section 1 we show how bounds on the optimal policy may
be obtained when some partial information about F is available.
In particular, we show that if F, the distribution of offers,
satisfies the NWUE (new worse than used in expectation) property,
defined as

$$ E_F[X - a \mid X > a] \ge E_F[X] \quad \text{for all } a \ge 0, $$

then the optimal policy has a monotonic relationship with the
optimal policy in the case where the distribution of offers is
exponential with the same mean as F.

In Sections 2 and 3 we consider a Bayesian version of this
model by supposing that F is known to be one of the distributions
$F_1, F_2, \ldots, F_n$ with given initial prior probabilities. In
Section 2 we do not allow, and in Section 3 we do allow, the recall
of old offers. In both cases we provide bounds on the optimal policy
in terms of the optimal policies in the case where it is known
which of the $F_i$ is equal to F. This Bayesian format has previously
been considered in [2], which assumed that F was a normal
distribution with known variance and imposed a normal prior distribution
on the mean of F. As our model imposes no parametric condition
on F in the prior distribution, the type of results we obtain are
somewhat different from those in [2].


1.  Independent  and  Identically  Distributed  Offers  from  an  Unknown 
Distribution  with  Partial  Information 

If the successive offers were independent and identically
distributed random variables having known distribution F, then it is
well known (see [1]) that the policy that maximizes the total expected
return, both with and without recall, is to accept an offer x if and
only if $x \ge x_F$, where $x_F$ is the smallest value such that

$$ x_F \ge \frac{\int_{x_F}^{\infty} x \, dF(x) - c}{1 - F(x_F)} . $$

If F is continuous, this reduces to

$$ c = \int_{x_F}^{\infty} (x - x_F) \, dF(x) . $$

The optimal expected return is $x_F + c$.
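
As a numerical illustration of the continuous case, the following Python sketch solves $c = \int_{x_F}^{\infty} (x - x_F)\,dF(x)$ by bisection. It assumes exponentially distributed offers, for which $E[(X - x)^+] = \mu e^{-x/\mu}$, so the root can be compared with the closed form $-\mu \log(c/\mu)$. The function names and the values mean = 10 and cost = 1 are hypothetical choices made for the example and do not come from the paper.

```python
import math

def expected_excess_exponential(x, mean):
    """E[(X - x)^+] for an exponential offer with the given mean (x >= 0)."""
    return mean * math.exp(-x / mean)

def reservation_price(expected_excess, cost, lo=0.0, hi=1e6, tol=1e-10):
    """Bisection for the x solving expected_excess(x) = cost.

    Assumes expected_excess is continuous and decreasing, with
    expected_excess(lo) >= cost >= expected_excess(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid) > cost:
            lo = mid   # excess still above the cost: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

mean, cost = 10.0, 1.0   # illustrative values only
x_F = reservation_price(lambda x: expected_excess_exponential(x, mean), cost)
closed_form = -mean * math.log(cost / mean)
print(x_F, closed_form)
```

Any other continuous distribution can be handled by passing a different `expected_excess` function, provided it is decreasing on the search interval.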

We shall start out by comparing the optimal critical number for
two different distributions. To begin we need the following definition.

Definition:

For any two probability distributions F and G we say that
$F \le_v G$ if

$$ \int f(x) \, dF(x) \le \int f(x) \, dG(x) $$

for all increasing convex functions f.

If F and G have the same means, then $F \le_v G$ intuitively
means that F has less variability than G.


Proposition 1:

Let $F \le_v G$; then $x_F \le x_G$.

Proof:

$x_G$ is the smallest value satisfying

$$ c \ge E_G[(X - x_G)^+] . $$

Now

$$ E_G[(X - x_G)^+] \ge E_F[(X - x_G)^+] $$

since $f(x) = (x - x_G)^+$ is an increasing convex function.
Hence,

$$ c \ge E_F[(X - x_G)^+] , $$

implying that

$$ x_F \le x_G . $$

||
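
Proposition 1 can be illustrated with two small discrete distributions. The sketch below is only an example under stated assumptions: F is taken to be a point mass at 10 and G a fair two-point distribution on {5, 15}, so that G is a mean-preserving spread of F and hence $F \le_v G$ by Jensen's inequality. The critical number is computed from the characterization used in the proof above (the smallest x with $E[(X - x)^+] \le c$). The helper names and the cost value are hypothetical.

```python
def expected_excess(x, values, probs):
    """E[(X - x)^+] for a discrete offer distribution."""
    return sum(p * max(v - x, 0.0) for v, p in zip(values, probs))

def critical_number(values, probs, cost, tol=1e-10):
    """Smallest x with E[(X - x)^+] <= cost, found by bisection."""
    lo = min(values) - cost - 1.0   # here the excess is at least cost + 1
    hi = max(values)                # here the excess is 0 <= cost
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid, values, probs) > cost:
            lo = mid
        else:
            hi = mid
    return hi

cost = 1.0                                             # illustrative value
x_F = critical_number([10.0], [1.0], cost)             # F: point mass at 10
x_G = critical_number([5.0, 15.0], [0.5, 0.5], cost)   # G: spread around 10
print(x_F, x_G)
```

With these inputs the less variable distribution F gives the smaller critical number, as the proposition asserts.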


Proposition 2 is concerned with the return from a nonoptimal
policy.

Proposition 2:

If $x \le x_F$, then the policy that accepts the first offer
that is at least as large as x has an expected return of at
least $x + c$.

Proof:

To prove the above, consider the expected difference between
the optimal policy that uses the critical number $x_F$ and the above
policy that uses the critical number x. By conditioning on whether
an offer between x and $x_F$ occurs before or after an offer greater
than $x_F$, we see that the expected difference is at most $x_F - x$ in
the former case (since the expected return from the optimal policy
starting at the time of this offer between x and $x_F$ is equal to
$x_F + c - c = x_F$) and it is 0 in the latter case. Hence, the result
follows. ||
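
Proposition 2 can be checked numerically under an assumed exponential offer distribution. The sketch below uses the expression $(\int_t^{\infty} y\,dF(y) - c)/(1 - F(t))$ for the return of the policy with threshold t, that is, with the cost c charged before every offer including the first. On that reading the optimal policy returns $x_F$ rather than $x_F + c$, and the bound to verify becomes "return at least x" for every $x \le x_F$. This shift of convention, the exponential assumption, and the numerical values are assumptions of the example, not statements from the paper.

```python
import math

def threshold_return(t, mean, cost):
    """Expected return of 'accept the first offer >= t' when offers are
    exponential with the given mean and cost is paid before every offer:
    (E[X; X >= t] - cost) / P(X >= t) = t + mean - cost * exp(t / mean).
    """
    return t + mean - cost * math.exp(t / mean)

mean, cost = 10.0, 1.0                    # illustrative values only
x_F = -mean * math.log(cost / mean)       # critical number for the exponential

# For thresholds at or below x_F the policy returns at least its threshold.
below = [x_F * k / 20 for k in range(21)]
gaps_below = [threshold_return(t, mean, cost) - t for t in below]

# For thresholds above x_F it returns strictly less than the threshold.
above = [x_F + k for k in range(1, 6)]
gaps_above = [threshold_return(t, mean, cost) - t for t in above]

print(min(gaps_below), max(gaps_above))
```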

Definition:

We say that the distribution F, with F(0-) = 0, is NWUE if

$$ \frac{\int_a^{\infty} \bar F(x) \, dx}{\bar F(a)} \ge \int_0^{\infty} \bar F(x) \, dx \quad \text{for all } a \ge 0 , $$

where $\bar F(x) = 1 - F(x)$. (If X is a random variable having distribution
F, then the above is equivalent to $E[X - a \mid X > a] \ge E[X]$.)


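Proposition 3:

If F is NWUE with mean $\mu$ then

$$ E(\mu) \le_v F , $$

where $E(\mu)$ is an exponential distribution with mean $\mu$.

Proof:

It is easy to show that $F \ge_v G$ is equivalent to

$$ \int_a^{\infty} \bar F(x) \, dx \ge \int_a^{\infty} \bar G(x) \, dx \quad \text{for all } a . $$

Thus, we have to show that

$$ \int_a^{\infty} \bar F(x) \, dx \ge \mu e^{-a/\mu} $$

whenever F is NWUE. By the definition of NWUE we have

$$ \int_a^{\infty} \bar F(x) \, dx \ge \mu \bar F(a) , $$

or equivalently,

$$ \bar F_e(a) \ge \bar F(a) = \mu f_e(a) , $$

where

$$ F_e(a) = \int_0^a \frac{\bar F(x)}{\mu} \, dx \quad \text{and} \quad f_e(a) = \frac{d F_e(a)}{da} $$

are the equilibrium distribution corresponding to F and its density,
and $\bar F_e = 1 - F_e$. Hence,

$$ \frac{f_e(a)}{\bar F_e(a)} \le \frac{1}{\mu} , $$

thereby implying, upon integrating, that

$$ -\log \bar F_e(a) \le a/\mu , $$

which proves the result.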

We  are  now  ready  for  the  main  theorem  of  this  section. 

Theorem 1:

If the unknown distribution F is known to be NWUE and to have
mean $\mu$ then

$$ x_F \ge x , $$

and the policy which accepts the first offer of at least x has
return of at least $x + c$, where

$$ x = -\mu \log(c/\mu) . $$

Proof:

The result follows immediately from Propositions 1, 2, and 3
since $x = x_F$ when F is an exponential distribution with mean $\mu$. ||

Remark:

One instance in which the distribution of offers would be NWUE
is if there were many classes of potential customers and offers from
each class followed an exponential distribution. Thus the distribution
of offers would be a mixture of exponential distributions and the
degenerate distribution at 0 (indicating no offer), and would thus
be NWUE.
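
To make Theorem 1 concrete for the situation described in the Remark, the sketch below takes a hypothetical mixture of two exponential customer classes (equal weights, means 5 and 15), computes the bound $x = -\mu \log(c/\mu)$ from the overall mean alone, and compares it with the critical number $x_F$ of the mixture obtained by bisection. As in the check of Proposition 2, the return of a threshold policy is measured with c charged before every offer, which is an assumption of this example; the class weights, means, cost, and function names are also illustrative.

```python
import math

def expected_excess(x, weights, means):
    """E[(X - x)^+] for a mixture of exponentials, x >= 0."""
    return sum(w * m * math.exp(-x / m) for w, m in zip(weights, means))

def critical_number(weights, means, cost, tol=1e-10):
    """Bisection for the x >= 0 with E[(X - x)^+] = cost (needs cost <= mean)."""
    lo, hi = 0.0, 1.0
    while expected_excess(hi, weights, means) > cost:
        hi *= 2.0                      # grow the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid, weights, means) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold_return(t, weights, means, cost):
    """(E[X; X >= t] - cost) / P(X >= t) for the mixture, t >= 0."""
    num = sum(w * (t + m) * math.exp(-t / m) for w, m in zip(weights, means)) - cost
    den = sum(w * math.exp(-t / m) for w, m in zip(weights, means))
    return num / den

weights, means, cost = [0.5, 0.5], [5.0, 15.0], 1.0   # hypothetical classes
mu = sum(w * m for w, m in zip(weights, means))
x_bound = -mu * math.log(cost / mu)           # threshold of Theorem 1
x_F = critical_number(weights, means, cost)   # critical number of the mixture
print(x_bound, x_F, threshold_return(x_bound, weights, means, cost))
```

The point of the theorem is visible here: `x_bound` needs only the mean of the offers, yet it sits below `x_F` and the policy using it still returns at least its own threshold.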




2.  A Bayesian  Model  Without  Recall  of  Past  Offers 

In this section we suppose that if an offer is rejected then
it can never be accepted in the future. In addition, we suppose that
although the distribution F is not known with certainty, we do
know that it is one of the distributions $F_1, F_2, \ldots, F_n$, with given
prior probabilities. We say that the state of the system is (x, P)
when x is the present offer under consideration and $P = (P_1, \ldots, P_n)$
is the posterior probability vector, given all the information that
we have accumulated up to that point (including the present offer x),
as to which of the $F_i$ is the actual distribution.

Also define V(x, P) to equal the expected return from this day
onward given that the state today is (x, P) and we employ an optimal
policy. (If we assume, as we do, that each of the $F_i$ has a finite
variance and c > 0, then it can be shown as in [1] that an optimal
policy exists.)

The optimality equation thus takes the following form

$$ V(x, P) = \max\{x, \; V(P) - c\} , $$

where V(P), which represents the best you can do when the distribution
is chosen by the prior probability vector $P = (P_1, \ldots, P_n)$, satisfies

$$ V(P) = \int V(y, T_y P) \, dF_P(y) , $$

where

$$ F_P(y) = \sum_i P_i F_i(y) $$

and

$$ (T_y P)_j = \mathrm{Prob}(F_j \mid P, y) = \frac{P_j \, dF_j(y)}{\sum_i P_i \, dF_i(y)} . $$

Furthermore, the optimal policy accepts the offer in state (x, P)
if and only if

$$ x \ge V(P) - c . $$
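
The posterior update $(T_y P)_j = \mathrm{Prob}(F_j \mid P, y)$ is an ordinary application of Bayes' rule. A minimal Python sketch follows, assuming that each candidate distribution is discrete and represented as a dictionary mapping offer values to probabilities; that representation, the function name, and the two example distributions are hypothetical and chosen only to show the mechanics.

```python
def posterior(prior, pmfs, y):
    """Return T_y P: the j-th entry is P_j f_j(y) / sum_i P_i f_i(y),
    where f_i is the probability mass function of candidate F_i."""
    weights = [p * pmf.get(y, 0.0) for p, pmf in zip(prior, pmfs)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("offer has zero probability under the prior")
    return [w / total for w in weights]

# Two hypothetical candidate offer distributions on the values {1, 2}.
F1 = {1.0: 0.5, 2.0: 0.5}
F2 = {1.0: 0.2, 2.0: 0.8}

prior = [0.5, 0.5]
after_high = posterior(prior, [F1, F2], 2.0)   # an offer of 2 favours F2
after_low = posterior(prior, [F1, F2], 1.0)    # an offer of 1 favours F1
print(after_high, after_low)
```

Applying `posterior` repeatedly, with each output used as the next prior, gives the probability vector P that forms the second coordinate of the state (x, P).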


Proposition 4:

V(P) is a convex function of P.

Proof:

Recall that V(P) represents the best we can do when the
distribution is chosen according to P. Now suppose $P = \lambda P^1 + (1 - \lambda) P^2$,
for some $0 < \lambda < 1$, and suppose that the distribution to be used is to
be chosen according to the following two-stage experiment. First we
flip a coin having probability $\lambda$ of coming up heads. If the coin
comes up heads then we choose the distribution according to the prior
probability $P^1$, while if it comes up tails then we use $P^2$. Now if
we are not told the outcome of the coin flip then the problem is
exactly the same as if the distribution was chosen according to P
and thus the best we can do is V(P). On the other hand, if we are
to be told about the outcome of the coin flip then by conditioning
on the outcome we see that our expected return if we play optimally
is $\lambda V(P^1) + (1 - \lambda) V(P^2)$. Hence, as additional information
cannot lower our expected return we see that

$$ V(P) \le \lambda V(P^1) + (1 - \lambda) V(P^2) $$

and the result is proven. ||

Let $x_i = x_{F_i}$, i = 1, ..., n.

Corollary 1:

$$ V(P) \le \sum_i P_i x_i + c $$

Proof:

This follows directly from Proposition 4 since V(0, ..., 0, 1, 0, ..., 0)
$= x_i + c$ (where the 1 is in the ith place).

Proposition 5:

If the present state is (x, P) then

(i) if $x \ge \sum_i P_i x_i$ then it is optimal to accept x,

(ii) if

$$ x < \sum_i P_i \, \frac{\int_x^{\infty} y \, dF_i(y) - c}{1 - F_i(x)} $$

then it is optimal to reject the offer x,

(iii) if

$$ x < \sum_i P_i \int_0^{\infty} y \, dF_i(y) - c $$

then it is optimal to reject x.


Proof:

(i) If $x \ge \sum_i P_i x_i$ then, using Corollary 1, we have

$$ x \ge V(P) - c $$

and (i) is established.

(ii) Suppose the present state is (x, P) and consider the policy
that accepts the first offer greater than x. The expected
return from this policy is

$$ \sum_i P_i \left[ \int_x^{\infty} \frac{y \, dF_i(y)}{1 - F_i(x)} - \frac{c}{1 - F_i(x)} \right] , $$

which follows by noting that, given that the distribution is
$F_i$, the expected number of additional offers that will be
made until one is accepted is $1/(1 - F_i(x))$. Clearly if x
is less than this value, then it cannot be optimal to accept
the present offer of x.

(iii) The proof of (iii) is similar to that of (ii) in that it
considers the return when in state (x, P) if you accept the
next offer, and notes that if this return is greater than x
then x should clearly not be accepted.
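
The three tests of Proposition 5 are easy to evaluate once the candidate distributions are specified. The sketch below assumes, purely for illustration, that every $F_i$ is exponential with mean $\mu_i$; then $x_i = \mu_i \log(\mu_i/c)$, the bracket in part (ii) equals $x + \mu_i - c\,e^{x/\mu_i}$, and the integral in part (iii) is $\mu_i$. The function name, the returned labels, and the numerical inputs are hypothetical.

```python
import math

def classify_offer(x, post, means, cost):
    """Apply parts (i)-(iii) of Proposition 5 for exponential candidates.

    post  : posterior probabilities P_i (already updated with the offer x)
    means : means mu_i of the exponential candidates F_i
    """
    crit = [m * math.log(m / cost) for m in means]           # x_i
    if x >= sum(p * xi for p, xi in zip(post, crit)):
        return "accept"                                       # part (i)
    wait_for_better = sum(p * (x + m - cost * math.exp(x / m))
                          for p, m in zip(post, means))       # bound in (ii)
    take_next = sum(p * m for p, m in zip(post, means)) - cost  # bound in (iii)
    if x < wait_for_better or x < take_next:
        return "reject"
    return "undetermined"    # the bounds do not settle this offer

post, means, cost = [0.5, 0.5], [5.0, 15.0], 1.0   # illustrative inputs
for offer in (5.0, 20.0, 30.0):
    print(offer, classify_offer(offer, post, means, cost))
```

Offers that fall in the "undetermined" band are exactly those for which the exact threshold V(P) - c would have to be computed.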

Remark:

It follows from part (ii) of the above proposition that if
$x < \min(x_1, \ldots, x_n)$ then it is always optimal to reject x.

Let us now consider the special case where there are only two
possible distributions, i.e., $F_1$ and $F_2$, and suppose $x_1 \le x_2$.
In this case the state can be represented as the pair (x, P) where
x is the present offer and P is the present probability (given all
information, including x, accumulated up to this point) that $F_2$ is
the true distribution. In this case we have

Theorem 2:

V(P) is an increasing function of P, $0 \le P \le 1$.

Proof:

Since V(P) is a convex function of P (Proposition 4) the
result would follow if we could show that

$$ V(0) \le V(P) \quad \text{for all } P . $$

Now $V(0) = x_{F_1} + c = x_1 + c$. Also, as it is always optimal to reject
an offer less than $\min(x_1, x_2) = x_1$, it follows from the optimality
equation that

$$ V(P) - c \ge x_1 \quad \text{for all } P , $$

which proves the result. ||

Thus, when n = 2 and $x_1 \le x_2$, it is optimal to accept the
offer when in state (x, P) if and only if $x \ge h(P)$, where $h(P) = V(P) - c$
is an increasing convex function of P with $h(0) = x_1$, $h(1) = x_2$.
Furthermore, bounds on h(P) are given by Proposition 5.


Remark:

There does not appear to be an analogue to Theorem 2 when there
are more than 2 possible distributions. For instance, suppose that the
distributions $F_1, F_2, \ldots, F_n$ are stochastically increasing in the
sense that $F_i(t)$ is nonincreasing in i for each t. If we define the
probability vector P to be greater than or equal to the probability
vector Q, written $P \ge Q$, if

$$ \sum_{i=1}^{j} P_i \le \sum_{i=1}^{j} Q_i \quad \text{for each } j = 1, \ldots, n , $$

then we might hope to prove that $V(P) \ge V(Q)$. However, this need not be
the case as is indicated by the following example. Suppose $F_1$ puts
all its weight on the value .9, $F_2$ puts all its weight on the value 1,
and $F_3$ is the distribution of a random variable that takes on the value
1 with  probability  .99  and  (10)^  with  probability  .01,  and  suppose 
c = 1. Now, $P = (0, .9, .1) \ge Q = (.9, 0, .1)$, but it turns out that
$V(P) < V(Q)$; the reason being that under Q it only takes a single
observation to determine the true $F_i$, whereas this is not so under P. ||


3.  Independent and Identically Distributed Offers from an Unknown
Distribution with Recall of Past Offers

In the previous section we assumed that once an offer was rejected
by the decision maker, then that offer immediately disappears. In this
section, however, we consider the same model as in Section 2, but with
the exception that an offer remains good indefinitely and may be accepted
at any time.

It turns out that when the distribution of offers is known, then
the optimal policy in this case is identical to the one when recalling
past offers is not allowed. That is, the optimal policy is to accept
the first offer that is at least as large as $x_F$, and the expected
return under the optimal policy is $x_F + c$, where $x_F$ is as defined in
Section 1.

Consider now the case where the distribution of offers is one
of the distributions $F_1, \ldots, F_n$, chosen according
to some initial probability vector. The state of the system at any
time can be defined by (m, P) where m is the maximum offer that has
been received up to that time and P is the posterior probability
vector (given all offers up to that time, including any just made) of
the true distribution. The optimality equation takes the form

$$ V(m, P) = \max\left\{ m, \; \int_0^{\infty} V(\max(m, y), T_y P) \, dF_P(y) - c \right\} , $$

where $F_P = \sum_i P_i F_i$ and $T_y P$ is the posterior probability
vector after a further offer y is observed.


While it follows from its definition that V(m, P) is an
increasing function of m for fixed P, it is not immediately evident
from the optimality equation that if the offer m is accepted when
in state (m, P) then the offer $m'$ is also accepted when in state
$(m', P)$ whenever $m' \ge m$. We now prove this.


Proposition 6:

For fixed P, V(m, P) - m is a nonincreasing function of m.

Proof:

Suppose $m_1 < m_2$. Note that the distribution of the sequence of future
offers is the same no matter whether the initial state is $(m_1, P)$ or
$(m_2, P)$ (since it only depends on (m, P) through P). We can then conclude
that if the initial state is $(m_1, P)$ then, by following throughout
the optimal policy for the initial state $(m_2, P)$, our return
when we stop is within $m_2 - m_1$ of what it would have been if the
initial state was really $(m_2, P)$. Therefore,

$$ V(m_1, P) + m_2 - m_1 \ge V(m_2, P) . $$

||

Corollary 2:

If it is optimal to accept $m_1$ when in state $(m_1, P)$ then
it is optimal to accept $m_2$ when in state $(m_2, P)$ whenever
$m_2 \ge m_1$.

Proof:

If $V(m_1, P) = m_1$ then from Proposition 6

$$ V(m_2, P) - m_2 \le 0 . $$

This implies, from the optimality equation, that

$$ V(m_2, P) = m_2 . $$


Proposition 7:

For fixed m, V(m, P) is a convex function of P.

Proof:

The proof is identical to the proof of Proposition 4 in
Section 2. ||
Corollary 3:

$$ V(m, P) \le \sum_i P_i \max(m, x_i) , $$

where $x_i = x_{F_i}$.

Proof:

Letting $e_i$ be the vector of zeros with a one in the ith
place, then

$$ V(m, e_i) = \begin{cases} m & \text{if } m \ge x_i , \\ x_i & \text{if } m < x_i . \end{cases} $$

Hence, from convexity,

$$ V(m, P) \le \sum_i P_i V(m, e_i) = \sum_i P_i \max\{m, x_i\} . $$

Proposition 8:

If the present state is (m, P) then

(i) if $m \ge \sum_i P_i \max(m, x_i)$ then it is optimal to accept m,

(ii) if

$$ m < \sum_i P_i \, \frac{\int_m^{\infty} y \, dF_i(y) - c}{1 - F_i(m)} $$

then it is optimal to look at another offer,

(iii) if

$$ m < \sum_i P_i \left[ m F_i(m) + \int_m^{\infty} y \, dF_i(y) \right] - c $$

then it is optimal to look at another offer.


Proof:

Part (i) follows directly from Corollary 3, while the proofs
of parts (ii) and (iii) are identical to their corresponding results
in Proposition 5 of Section 2.
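
The bounds of Proposition 8 can be evaluated in the same way as those of Proposition 5. The sketch below again assumes exponential candidates with means $\mu_i$, which is an assumption of the example only. In that case $x_i = \mu_i \log(\mu_i/c)$, the bracket in part (ii) is $m + \mu_i - c\,e^{m/\mu_i}$, and $m F_i(m) + \int_m^{\infty} y\,dF_i(y) = m + \mu_i e^{-m/\mu_i}$. All names and numbers are hypothetical.

```python
import math

def classify_best_offer(m, post, means, cost):
    """Apply parts (i)-(iii) of Proposition 8 for exponential candidates,
    where m is the largest offer received so far (recall allowed)."""
    crit = [mu * math.log(mu / cost) for mu in means]            # x_i
    if m >= sum(p * max(m, xi) for p, xi in zip(post, crit)):
        return "accept"                                           # part (i)
    wait_for_better = sum(p * (m + mu - cost * math.exp(m / mu))
                          for p, mu in zip(post, means))          # bound in (ii)
    one_more_look = sum(p * (m + mu * math.exp(-m / mu))
                        for p, mu in zip(post, means)) - cost     # bound in (iii)
    if m < wait_for_better or m < one_more_look:
        return "continue"
    return "undetermined"    # the bounds do not settle this state

post, means, cost = [0.5, 0.5], [5.0, 15.0], 1.0   # illustrative inputs
for best in (5.0, 35.0, 45.0):
    print(best, classify_best_offer(best, post, means, cost))
```

Note that the test in part (i) can only succeed when m is at least as large as every $x_i$ carrying positive posterior probability, since otherwise the weighted sum of max(m, x_i) exceeds m.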

Suppose now that n = 2 and $x_1 \le x_2$. In this case we
represent the state by (m, P), where P is the posterior probability
that $F_2$ is the true distribution.

Theorem 3:

V(m, P) is increasing in P for fixed m.

Proof:

As in the corresponding proof of the previous section we need to
show that

$$ V(m, 0) \le V(m, P) . $$

Now

$$ V(m, 0) = \max(m, x_1) . $$

However,

$$ V(m, P) \ge m , $$

and as it follows from Part (ii) of Proposition 8 that it is never
optimal to accept an offer less than $x_1$, we have

$$ V(m, P) \ge x_1 . $$

That is,

$$ m \ge x_1 \;\Rightarrow\; V(m, P) \ge x_1 , $$
$$ m < x_1 \;\Rightarrow\; V(m, P) = V(x_1, P) \ge x_1 , $$

and the proof is complete. ||

Hence, when n = 2 and $x_1 \le x_2$, it is optimal to accept m
when in state (m, P) if and only if $m \ge m(P)$, where m(P) is an
increasing convex function of P with $m(0) = x_1$, $m(1) = x_2$.


